ESA 2006: Algorithms – ESA 2006, pp. 504-515
Inner-Product Based Wavelet Synopses for Range-Sum Queries
Abstract
In recent years, wavelet-based synopses have been shown to be effective for approximate query processing in database systems. The simplest wavelet synopses are constructed by computing the Haar transform of a vector consisting of either the raw data or the prefix sums of the data, and then using a greedy heuristic to select the wavelet coefficients kept in the synopsis. The greedy heuristic is known to be optimal for point queries with respect to the mean-squared error, but no similarly efficient optimality result was known for range-sum queries, for which the effectiveness of such synopses had only been shown experimentally.
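The basic construction described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes an orthonormal Haar transform, an input whose length is a power of two, and a greedy rule that keeps the `budget` coefficients of largest absolute value. The function names (`haar_transform`, `inverse_haar`, `greedy_synopsis`) are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_transform(values):
    """Orthonormal Haar transform; len(values) must be a power of two."""
    n = len(values)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    out = [0.0] * n
    work = [float(v) for v in values]
    length = n
    while length > 1:
        half = length // 2
        # Detail coefficients of this level go to positions half..length-1.
        out[half:length] = [(work[2 * i] - work[2 * i + 1]) / SQRT2
                            for i in range(half)]
        # Scaled averages are carried to the next, coarser level.
        work = [(work[2 * i] + work[2 * i + 1]) / SQRT2 for i in range(half)]
        length = half
    out[0] = work[0]
    return out


def inverse_haar(coeffs):
    """Inverse of haar_transform."""
    n = len(coeffs)
    work = [coeffs[0]]
    length = 1
    while length < n:
        details = coeffs[length:2 * length]
        finer = []
        for avg, det in zip(work, details):
            finer.append((avg + det) / SQRT2)
            finer.append((avg - det) / SQRT2)
        work = finer
        length *= 2
    return work


def greedy_synopsis(coeffs, budget):
    """Keep the `budget` coefficients of largest magnitude, zero the rest."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]),
                   reverse=True)
    kept = set(order[:budget])
    return [c if i in kept else 0.0 for i, c in enumerate(coeffs)]


data = [2.0, 2.0, 0.0, 2.0, 3.0, 5.0, 4.0, 4.0]
coeffs = haar_transform(data)
approx = inverse_haar(greedy_synopsis(coeffs, 3))
```

Because this transform is orthonormal, the sum of squared point-query errors of `approx` equals the sum of squares of the dropped coefficients (Parseval), which is why keeping the largest coefficients is optimal for point queries under the mean-squared error.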
We construct an operator that defines a norm equivalent to the mean-squared error over all possible range-sum queries, where the norm is measured on the prefix-sums vector. We show that the Haar basis (and in fact any wavelet basis) is orthogonal with respect to the inner product defined by this novel operator. This allows us to use Parseval-based thresholding and thus obtain the first linear-time construction of a provably optimal wavelet synopsis for range-sum queries. We show that the new thresholding is very similar to the greedy heuristic that is based on point queries.
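To make the idea of measuring range-sum error on the prefix-sums vector concrete, the sketch below checks an elementary identity. It is an illustration under stated assumptions, not the operator defined in the paper: it assumes every query is answered as a difference `p[j] - p[i]` (with `i < j`) of two entries of the prefix-sums vector, and that all such queries are weighted equally. The function names are hypothetical.

```python
def range_sum_sse(exact_prefix, approx_prefix):
    """Brute force: sum of squared errors over all queries p[j] - p[i], i < j."""
    n = len(exact_prefix)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            exact = exact_prefix[j] - exact_prefix[i]
            approx = approx_prefix[j] - approx_prefix[i]
            total += (exact - approx) ** 2
    return total


def quadratic_form(exact_prefix, approx_prefix):
    """Closed form of the same quantity: n * sum(e_k^2) - (sum(e_k))^2,
    where e is the entrywise error of the approximate prefix-sums vector."""
    e = [a - b for a, b in zip(exact_prefix, approx_prefix)]
    n = len(e)
    return n * sum(x * x for x in e) - sum(e) ** 2


exact = [2.0, 4.0, 4.0, 6.0, 9.0, 14.0, 18.0, 22.0]
approx = [2.5, 3.5, 4.0, 6.5, 9.0, 13.0, 18.5, 22.0]
brute = range_sum_sse(exact, approx)
closed = quadratic_form(exact, approx)
```

In this sketch the total range-sum error is a quadratic form in the prefix-sums error vector, and the second term vanishes whenever that error vector sums to zero, in which case the quantity is just `n` times the ordinary squared error.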
For the case of range-sum queries over the raw data, we define a similar operator and show that the Haar basis is not orthogonal with respect to the inner product defined by this operator.
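A small numerical check illustrates the raw-data case. The matrix below is an assumption made for illustration, not necessarily the operator defined in the paper: it is the quadratic form obtained by summing squared errors over all ranges `[a, b]` with equal weight, so that entry `(k, l)` counts the ranges containing both positions. The function names are hypothetical, and the Haar vectors are left unnormalized because scaling does not affect orthogonality.

```python
def range_count_matrix(n):
    """M[k][l] = number of ranges [a, b] with a <= min(k, l) and max(k, l) <= b."""
    return [[(min(k, l) + 1) * (n - max(k, l)) for l in range(n)]
            for k in range(n)]


def bilinear(u, matrix, v):
    """Inner product u^T M v induced by the matrix."""
    n = len(u)
    return sum(u[k] * matrix[k][l] * v[l] for k in range(n) for l in range(n))


def all_ranges_sse(err):
    """Brute force: sum over all ranges [a, b] of (err[a] + ... + err[b])^2."""
    n = len(err)
    total = 0.0
    for a in range(n):
        running = 0.0
        for b in range(a, n):
            running += err[b]
            total += running ** 2
    return total


n = 4
M = range_count_matrix(n)
coarse = [1.0, 1.0, -1.0, -1.0]   # unnormalized Haar vector, coarse level
fine = [1.0, -1.0, 0.0, 0.0]      # unnormalized Haar vector, fine level

standard_dot = sum(x * y for x, y in zip(coarse, fine))  # 0: orthogonal as usual
weighted_dot = bilinear(fine, M, coarse)                 # nonzero under M
```

Here `standard_dot` is zero while `weighted_dot` is not, so under this equal-weight assumption the two Haar vectors are not orthogonal with respect to the range-sum inner product on raw data, and Parseval-based thresholding does not carry over directly.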