Estimating Aggregate Join Queries over Data Streams Using Discrete Cosine Transform

  • Zhewei Jiang
  • Cheng Luo
  • Wen-Chi Hou
  • Feng Yan
  • Qiang Zhu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4080)

Abstract

Data stream processing must be on-line, one-pass, and efficient in both time and space. In this paper, we develop a framework for estimating equi-join query sizes based on the discrete cosine transform (DCT). The DCT provides concise and accurate representations of data distributions, and it can be updated easily in the presence of insertions and deletions. We have performed analyses and experiments to compare the DCT with sketch-based methods. The experimental results show that, given the same amount of space, our method yields more accurate estimates than sketch methods most of the time. They also confirm that the cosine series can be updated quickly enough to cope with the rapid flow of data.
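
The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an integer join attribute in the range 0..n-1, uses the unnormalized DCT-II basis, and keeps only the first m coefficients of each stream's frequency distribution. The names `DctSynopsis` and `estimate_join_size` are hypothetical. Each insertion or deletion touches every retained coefficient once, and the join size is estimated from the coefficient products through the orthogonality of the cosine basis.

```python
import math


class DctSynopsis:
    """First m DCT-II coefficients of a stream's frequency distribution.

    Assumption for this sketch: values are integers in [0, domain_size).
    """

    def __init__(self, domain_size, num_coeffs):
        self.n = domain_size
        self.m = min(num_coeffs, domain_size)
        self.coeffs = [0.0] * self.m

    def update(self, value, delta):
        # Adding delta occurrences of `value` shifts coefficient k by
        # delta * cos((2*value + 1) * k * pi / (2n)).
        for k in range(self.m):
            angle = (2 * value + 1) * k * math.pi / (2 * self.n)
            self.coeffs[k] += delta * math.cos(angle)

    def insert(self, value):
        self.update(value, 1)

    def delete(self, value):
        self.update(value, -1)


def estimate_join_size(a, b):
    """Estimate sum_x f(x) * g(x) from two synopses over the same domain.

    With all n coefficients the result is exact up to rounding;
    with fewer coefficients it is an approximation.
    """
    if a.n != b.n:
        raise ValueError("synopses must share the same domain size")
    m = min(a.m, b.m)
    total = a.coeffs[0] * b.coeffs[0] / a.n
    total += (2.0 / a.n) * sum(a.coeffs[k] * b.coeffs[k] for k in range(1, m))
    return total


# Usage: two small streams over the domain 0..7.
r = DctSynopsis(domain_size=8, num_coeffs=8)
s = DctSynopsis(domain_size=8, num_coeffs=8)
for v in [0, 1, 1, 3, 7]:
    r.insert(v)
for v in [1, 1, 3, 3, 5]:
    s.insert(v)
print(estimate_join_size(r, s))
```

Keeping fewer coefficients than the domain size trades accuracy for space, which is the trade-off the abstract compares against sketch-based methods.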

Keywords

Data Stream, Discrete Cosine Transform, Storage Space, Discrete Cosine Transform Coefficient, Continuous Query
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Zhewei Jiang (1)
  • Cheng Luo (1)
  • Wen-Chi Hou (1)
  • Feng Yan (1)
  • Qiang Zhu (2)

  1. Department of Computer Science, Southern Illinois University, Carbondale, USA
  2. Department of Computer and Information Science, University of Michigan, Dearborn, USA
