The VLDB Journal, Volume 18, Issue 1, pp 203–231

Hierarchically compressed wavelet synopses

  • Dimitris Sacharidis
  • Antonios Deligiannakis
  • Timos Sellis
Regular Paper

Abstract

The wavelet decomposition is a proven tool for constructing concise synopses of large data sets that can be used to obtain fast approximate answers. Existing research focuses on selecting an optimal set of wavelet coefficients to store so as to minimize some error metric, without, however, seeking to reduce the storage size of the coefficients themselves. In many real data sets, large spikes in the data values produce many large coefficient values that lie on paths of a conceptual tree structure known as the error tree. To exploit this fact, we introduce in this paper a novel compression scheme for wavelet synopses, termed hierarchically compressed wavelet synopses, that takes full advantage of hierarchical relationships among coefficients in order to reduce their storage. Our proposed compression scheme allows a larger number of coefficients to be stored under a given space constraint, thus increasing the accuracy of the produced synopsis. We propose optimal, approximate and greedy algorithms for constructing hierarchically compressed wavelet synopses that minimize the sum squared error while not exceeding a given space budget. Extensive experimental results on both synthetic and real-world data sets validate our novel compression scheme and demonstrate the effectiveness of our algorithms against existing synopsis construction algorithms.
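
The observation about spikes and error-tree paths can be illustrated with a small sketch. It is not the compression scheme or any of the algorithms proposed in the paper; it only assumes the standard unnormalized Haar decomposition (pairwise averages and half-differences) on an input whose length is a power of two. The function names `haar_decompose` and `error_tree_path` are hypothetical and chosen here for illustration.

```python
def haar_decompose(data):
    """Unnormalized Haar decomposition (assumes len(data) is a power of two).

    Returns a list w where w[0] is the overall average and w[i] (i >= 1) is
    the detail coefficient at error-tree node i, whose children are the
    nodes 2*i and 2*i + 1.
    """
    n = len(data)
    coeffs = [0.0] * n
    current = [float(x) for x in data]
    while len(current) > 1:
        half = len(current) // 2
        averages = [(current[2 * i] + current[2 * i + 1]) / 2 for i in range(half)]
        details = [(current[2 * i] - current[2 * i + 1]) / 2 for i in range(half)]
        # Details of this resolution level occupy nodes half .. 2*half - 1.
        coeffs[half:2 * half] = details
        current = averages
    coeffs[0] = current[0]
    return coeffs


def error_tree_path(n, j):
    """Coefficient indices on the error-tree path from the root to data value j."""
    node = (n + j) // 2  # lowest detail coefficient covering position j
    path = []
    while node >= 1:
        path.append(node)
        node //= 2
    path.append(0)  # the overall average sits above node 1
    return path[::-1]


# A single spike at position 5 in otherwise flat data (illustrative input).
data = [0, 0, 0, 0, 0, 8, 0, 0]
coeffs = haar_decompose(data)
nonzero = [i for i, c in enumerate(coeffs) if c != 0]
print(coeffs)
print(nonzero, error_tree_path(len(data), 5))
```

For this input the non-zero coefficients are exactly those at indices 0, 1, 3 and 6, which is the root-to-leaf path for the spike's position; every other coefficient is zero.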

Keywords

Wavelet synopsis, Data streams, Compression

Copyright information

© Springer-Verlag 2008

Authors and Affiliations

  • Dimitris Sacharidis (1)
  • Antonios Deligiannakis (2)
  • Timos Sellis (3)
  1. National Technical University of Athens, Athens, Greece
  2. Technical University of Crete, Chania, Greece
  3. IMIS - R.C. Athena and National Technical University of Athens, Athens, Greece
