Practical Algorithms for Tracking Database Join Sizes

  • Sumit Ganguly
  • Deepanjan Kesh
  • Chandan Saha
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3821)

Abstract

We present novel algorithms for estimating the size of the natural join of two data streams. The algorithms have efficient update processing times and provide estimates of excellent quality.

Keywords

Data Stream, Hash Table, Frequent Item, Practical Algorithm, Approximate Query

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Sumit Ganguly (1)
  • Deepanjan Kesh (1)
  • Chandan Saha (1)
  1. Indian Institute of Technology, Kanpur