Efficient two-dimensional Haar\(^+\) synopsis construction for the maximum absolute error measure
Abstract
Several wavelet synopsis construction algorithms have been proposed for optimal Haar\(^+\) synopses. We recently proposed the OptExtHP-EB algorithm, which finds an optimal one-dimensional \(\hbox {Haar}^+\) synopsis. By exploiting novel properties of optimal synopses, OptExtHP-EB represents the set of optimal synopses in a node of a \(\hbox {Haar}^+\) tree by a set of extended synopses. Although it is much faster than previous \(\hbox {Haar}^+\) synopsis construction algorithms, it can handle only one-dimensional data. In this paper, we extend OptExtHP-EB to obtain the OptExtHP-EB2D algorithm for two-dimensional \(\hbox {Haar}^+\) synopses. Whereas a node of a one-dimensional \(\hbox {Haar}^+\) tree has only two child nodes and three coefficients, a node of a two-dimensional \(\hbox {Haar}^+\) tree has four child nodes and seven coefficients, which makes the construction much more complex. Thus, for each possible subset of the coefficients selected in a node, we develop efficient methods to compute the set of optimal synopses, represented by extended synopses. Our experiments confirm the effectiveness of the proposed OptExtHP-EB2D algorithm.
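As a rough illustration of why the two-dimensional case needs more casework, the following minimal Python sketch counts the possible subsets of coefficients selected in a node (three coefficients in one dimension, seven in two) and evaluates the maximum absolute error of an approximation. The function names and the sample values are hypothetical and chosen only for illustration; this is not the OptExtHP-EB2D algorithm itself.

```python
from itertools import combinations


def max_abs_error(data, approx):
    """Maximum absolute error between original values and their approximations."""
    return max(abs(d - a) for d, a in zip(data, approx))


def count_coefficient_subsets(num_coefficients):
    """Count every subset (including the empty one) of a node's coefficients."""
    return sum(
        1
        for k in range(num_coefficients + 1)
        for _ in combinations(range(num_coefficients), k)
    )


# Hypothetical illustration: per-node subset counts for the two tree types.
subsets_1d = count_coefficient_subsets(3)  # three coefficients per 1D node
subsets_2d = count_coefficient_subsets(7)  # seven coefficients per 2D node

# Hypothetical data and approximation, used only to show the error measure.
error = max_abs_error([4, 8, 6, 2], [5, 7, 6, 4])
```

Under these assumptions, a one-dimensional node admits 8 coefficient subsets and a two-dimensional node admits 128, which gives a sense of the additional cases a two-dimensional construction has to cover.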
Keywords
Query processing, Data synopses, Optimal wavelet synopsis construction, Approximate query answering
Acknowledgements
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2016R1D1A1A02937186), as well as by the Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT (NRF-2017M3C4A7063570). It was also supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT (NRF-2019R1F1A1062511).